SQL Server 2008 Analysis Services : An Analytics Design Methodology

12/13/2010 8:50:56 AM

A data warehouse can be built from the top down or from the bottom up. To build a top-down warehouse, you need to form a complete logical data model for the entire organization (or for all the subsystems within the scope of the project, such as all financial systems). In contrast, building a warehouse from the bottom up takes a departmental or business-area focus (for example, a sales order system only), which breaks the task of modeling the data into more manageable chunks. Such a departmental approach produces data marts that are potentially subsets of the overall data warehouse.

The bottom-up approach can simplify implementation. It helps get departmental or business-area information to the people who need it, makes it easier to protect sensitive data, and results in better query response times because data marts deal with less data than a voluminous transactional system. The risk in the data mart approach is that data marts implemented inconsistently can add up to a logically disjointed enterprise data warehouse if efforts aren’t carefully coordinated across the organization.

Before you embark on an OLAP database creation effort, the time you spend understanding the underlying requirements is the best investment you can make. If the scope is set correctly, you should be able to achieve an industrial-strength OLAP design without much difficulty. First, you need to take care of some groundwork:

1.	Carefully assess the scope of what you want to represent in the BI environment. Start small, as the bottom-up approach suggests. For instance, just tackle the sales data facts.
2.	Coordinate your efforts with other related BI efforts. Let people know that you are carving out a specific subject area or departmental data and, when you finish, publish your design to everyone.
3.	Seek out any shared dimensions that might have already been created for other cubes. You want to leverage these as much as possible for the sake of data consistency and nonredundant processing.
4.	Understand your data sources. The OLAP cube you create will be only as good as the data you put into it. It’s best to understand the dirty data issues in your sources long before you try to build an OLAP cube from them.
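
As a rough illustration of the last point, the following Python sketch profiles a small extract for two common dirty data problems: duplicate keys and missing values. The function name `profile_rows`, the field names, and the sample rows are all hypothetical; a real assessment would run against your actual sources and cover many more checks.

```python
from collections import Counter

def profile_rows(rows, key_field, required_fields):
    """Summarize duplicate keys and missing values in a list of dict rows."""
    key_counts = Counter(row.get(key_field) for row in rows)
    # A key that appears more than once would double-count facts in the cube.
    duplicate_keys = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )
    # Count rows where a required field is absent, None, or an empty string.
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }
    return {
        "row_count": len(rows),
        "duplicate_keys": duplicate_keys,
        "missing": missing,
    }

# Hypothetical sample extract from a sales order source.
sample_orders = [
    {"order_id": 1, "product": "Widget", "amount": 120.0},
    {"order_id": 2, "product": "", "amount": 75.5},
    {"order_id": 2, "product": "Gadget", "amount": None},
    {"order_id": 3, "product": "Widget", "amount": 40.0},
]

report = profile_rows(sample_orders, "order_id", ["product", "amount"])
print(report)
```

Running a check like this early gives you a concrete list of issues to raise with the owners of the source system before any cube design begins.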

An Analytics Mini-Methodology

To build OLAP solutions successfully, you should assess the requirements of your end users in as much detail as possible. A mini-methodology that focuses on the essential uses and characteristics of an analytic solution can prove invaluable. The following sections outline a solid approach to nailing down your BI requirements and producing OLAP designs that meet your end users’ needs.

Assumption: You are building a business area–focused OLAP cube.

Requirements Phase

1.	Identify the processing requirements for this decision support system (DSS). What analysis do you need to do? Are trend reporting, forecasting, and so on necessary? These can often be represented in use case form (via UML). Ask each user what business decision questions he or she needs to have answered. Ask each user how often he or she needs these questions answered and exactly when the questions must be answered. Ask each user how current the data must be to get accurate answers. (This speaks to data latency.)
2.	Identify the data needed to fulfill these requirements. What data must be touched to provide answers? The best way to capture this type of information is in a logical data model. Even a rough model is better than none at all. This is the point where you focus on the facts that need to be analyzed.
3.	Identify all possible hierarchies and level representations (that is, aggregations). These describe how the data is used. Most users are likely to tell you that they want to see product data in the product hierarchy structure that has already been set up (for example, product family, product groups).
4.	Identify the time hierarchies that the users need. Because time is usually implicit, it just needs to be clarified in terms of levels of aggregation (for example, years, quarters, months, weeks, days) and whether it needs to follow the fiscal calendar, the Gregorian calendar, both, or something else.
5.	Understand, from a security point of view, which data each user is allowed to view.
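
To make the time hierarchy question concrete, here is a minimal Python sketch that derives both Gregorian and fiscal levels for a single date. It assumes, purely for illustration, a fiscal year that starts on July 1 and is named after the calendar year in which it ends; the function `time_dimension_row` and its field names are hypothetical, and your organization's fiscal calendar may differ.

```python
from datetime import date

# Assumption for illustration only: the fiscal year starts on July 1.
FISCAL_START_MONTH = 7

def time_dimension_row(day, fiscal_start_month=FISCAL_START_MONTH):
    """Return the calendar and fiscal hierarchy levels for one date."""
    _, iso_week, _ = day.isocalendar()
    calendar_quarter = (day.month - 1) // 3 + 1
    # Number of whole months elapsed since the fiscal year began.
    months_into_fiscal_year = (day.month - fiscal_start_month) % 12
    fiscal_quarter = months_into_fiscal_year // 3 + 1
    # Assumed convention: a fiscal year is named after the calendar
    # year in which it ends.
    if fiscal_start_month > 1 and day.month >= fiscal_start_month:
        fiscal_year = day.year + 1
    else:
        fiscal_year = day.year
    return {
        "date": day.isoformat(),
        "year": day.year,
        "quarter": calendar_quarter,
        "month": day.month,
        "iso_week": iso_week,
        "fiscal_year": fiscal_year,
        "fiscal_quarter": fiscal_quarter,
    }

row = time_dimension_row(date(2010, 12, 13))
print(row)
```

Generating one such row per day gives you a time dimension table in which every level the users asked for is an explicit column that can be aggregated on.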

Design Phase

1.	Analyze which data sources are needed to fulfill the requirements. See whether dimensions or OLAP cubes that already exist can be shared.
2.	Understand what data transformations need to be done to the source data to provide it to the OLAP world. This might include pre-aggregation, reformatting, data integrity verifications, and so on.
3.	Translate these requirements into an OLAP model design:
	a.	Translate to MOLAP if your data sources are not going to be leveraged at all and you will be taking full advantage of OLAP storage.
	b.	Translate to ROLAP if you are going to leverage an existing relational design and storage.
	c.	Translate to HOLAP if you are going to partially utilize the source data storage and partially utilize OLAP storage. This is the most frequently used approach.
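
The storage choice in the last design step can be summarized as a small lookup. The sketch below reflects how Analysis Services distinguishes the three modes by where detail data and aggregations are stored; the function `storage_mode` and its string arguments are hypothetical names used only for illustration.

```python
def storage_mode(detail_store, aggregation_store):
    """Map where detail data and aggregations live to an OLAP storage mode.

    Each argument is either "olap" (multidimensional storage) or
    "relational" (the source relational database).
    """
    modes = {
        ("olap", "olap"): "MOLAP",              # everything in OLAP storage
        ("relational", "relational"): "ROLAP",  # everything stays relational
        ("relational", "olap"): "HOLAP",        # detail relational, aggregations OLAP
    }
    try:
        return modes[(detail_store, aggregation_store)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {detail_store!r}, {aggregation_store!r}"
        )

print(storage_mode("relational", "olap"))
```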

Construction Phase

1.	Implement data extraction, transformation, and loading (ETL) logic (via T-SQL, SSIS, or other methods).
2.	Create the data sources to be used.
3.	Create the dimensions.
4.	Create the cube.
5.	Select data measures (that is, the data facts) for the cube.
6.	Design the storage and aggregations.
7.	Process the cube. This brings the data into the OLAP environment.
8.	Verify data integrity.
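
The construction steps can be sketched end to end on a small scale. The following Python example uses an in-memory SQLite database as a stand-in for the relational side only; it is not Analysis Services, and the table names, column names, and sample rows are all hypothetical. It loads a product dimension and a sales fact table, computes one aggregation, and then verifies that the aggregated total reconciles with the source.

```python
import sqlite3

# Hypothetical source rows: (order_id, product, family, order_date, amount).
source_orders = [
    (1, "Widget", "Hardware", "2010-11-02", 120.0),
    (2, "Gadget", "Hardware", "2010-11-15", 75.5),
    (3, "Manual", "Books", "2010-12-01", 20.0),
    (4, "Widget", "Hardware", "2010-12-13", 40.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key    INTEGER PRIMARY KEY,
        product_name   TEXT UNIQUE,
        product_family TEXT
    );
    CREATE TABLE fact_sales (
        order_id    INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        order_month TEXT,
        amount      REAL
    );
""")

# Load the dimension first; repeated product names are ignored.
for _, product, family, _, _ in source_orders:
    conn.execute(
        "INSERT OR IGNORE INTO dim_product (product_name, product_family) "
        "VALUES (?, ?)",
        (product, family),
    )

# Load the facts, replacing the product name with its surrogate key
# and truncating the date to the month level (YYYY-MM).
for order_id, product, _, order_date, amount in source_orders:
    (product_key,) = conn.execute(
        "SELECT product_key FROM dim_product WHERE product_name = ?",
        (product,),
    ).fetchone()
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        (order_id, product_key, order_date[:7], amount),
    )

# One aggregation: sales amount by product family and month.
rollup = conn.execute("""
    SELECT d.product_family, f.order_month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d USING (product_key)
    GROUP BY d.product_family, f.order_month
    ORDER BY d.product_family, f.order_month
""").fetchall()

# Verify data integrity: the aggregated total must match the source total.
source_total = sum(row[4] for row in source_orders)
rollup_total = sum(row[2] for row in rollup)
assert abs(source_total - rollup_total) < 1e-9, "totals do not reconcile"

for family, month, total in rollup:
    print(family, month, total)
```

The reconciliation check at the end is the small-scale equivalent of the final construction step: if the totals diverge, the problem lies in the load or the dimension mapping and should be resolved before users see the cube.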

Implementation Phase

1.	Define the security roles in the cube.
2.	Train the users to use the system.
3.	Process the data into the OLAP environment (from production data sources).
4.	Verify data integrity.
5.	Allow users to use the OLAP cube.

Maintenance Phase

1.	Evaluate access optimization in the OLAP cube via usage analysis.
2.	Do data mining discovery, if desired.
3.	Make schema changes/enhancements, as necessary.
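
As one way to think about usage analysis, the sketch below ranks the combinations of hierarchy levels that queries group by most often, since frequently requested combinations are natural candidates for stored aggregations. The query log format, the level names, and the function `aggregation_candidates` are hypothetical; in practice the input would come from whatever query logging your OLAP environment provides.

```python
def aggregation_candidates(log, min_hits=2):
    """Rank level combinations by how often queries request them.

    `log` is a list of (levels, seconds) pairs, where `levels` is the
    collection of hierarchy levels a query grouped by and `seconds` is
    how long the query took.
    """
    stats = {}
    for levels, seconds in log:
        key = tuple(sorted(levels))  # same combination regardless of order
        count, total = stats.get(key, (0, 0.0))
        stats[key] = (count + 1, total + seconds)
    # Most frequent first; ties broken by total time spent, then by name.
    ranked = sorted(
        stats.items(),
        key=lambda item: (-item[1][0], -item[1][1], item[0]),
    )
    return [
        (key, count, total / count)
        for key, (count, total) in ranked
        if count >= min_hits
    ]

# Hypothetical query log entries.
query_log = [
    (("product_family", "month"), 2.0),
    (("month", "product_family"), 4.0),
    (("product_group", "quarter"), 1.0),
    (("product_family", "month"), 3.0),
    (("product_group", "quarter"), 1.0),
    (("product_name", "day"), 9.0),
]

candidates = aggregation_candidates(query_log)
for levels, hits, avg_seconds in candidates:
    print(levels, hits, avg_seconds)
```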